365daysofquant · 4 years ago

#8 Stat Arb plus Update

Date: 22 Aug 2021

It’s been a while and I’ve been slacking off already but I have reasons.

I got rejected from that quant internship. :( But I will provide updates on that in full in another post later today.

Right now, I wanna finish off Stat arb since it’s long overdue and then maybe start working on a new project (a poker bot??) and study for a QAM project (I need to learn some basic optimisation for it).

Okay let’s get this shit done today.

Turns out my VSCode environment keeps giving me this ‘can’t import module’ issue and I usually avoided using it but I decided to google this issue and solve it once and for all. So apparently Anaconda keeps its own Python environment with its own packages, separate from the global Python on the computer. I just changed the Python interpreter in VSCode to the one with ‘conda’ and shit worked.

I’m just gonna give a small review of what I did up until now since it’s been almost a month lol.

So I picked the top 25 stocks on the S&P 500 and created a data frame with close prices. Updated the date to last Friday’s date and turns out it’s actually been a month since I last updated this code lmao sad.

Dropped the NaN values from the df.

Used train_test_split from the sklearn package to split this price data 50-50 into train and test sets.

I compute the returns on the train set using pct_change() and build the Pearson correlation matrix with corr(method='pearson'), which I plot as a seaborn heat map using sns.heatmap.
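A minimal sketch of the returns-and-correlation step, using synthetic prices in place of the real S&P 500 data (the tickers and numbers are made up). A plain chronological slice stands in for train_test_split here, because train_test_split shuffles rows by default unless shuffle=False is passed; that substitution is an assumption about what suits time series, not necessarily what the original notebook did.

```python
import numpy as np
import pandas as pd

# Synthetic close prices standing in for the real data frame (hypothetical tickers).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2021-01-01", periods=200)
steps = rng.normal(0, 1, size=(200, 3))
prices = pd.DataFrame(100 + steps.cumsum(axis=0), index=dates,
                      columns=["AAA", "BBB", "CCC"])

# Chronological 50-50 split: first half is train, second half is test.
half = len(prices) // 2
train, test = prices.iloc[:half], prices.iloc[half:]

# Daily returns on the train set, then the Pearson correlation matrix.
returns = train.pct_change().dropna()
corr = returns.corr(method="pearson")
print(corr.round(2))
```

The resulting corr frame is what would be handed to sns.heatmap(corr, annot=True) to draw the heat map.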

We then create a function that uses coint from statsmodels.tsa.stattools to run a cointegration test, which returns the test statistic and the p-value. The null hypothesis is that there is no cointegration. So if we reject the null, i.e. p-value < 0.05, we append the pair to the list pairs.

I then create a heat map with the p-values. The lower the p-value, the better, since it is stronger evidence that the pair is cointegrated (we reject the null of no cointegration).

Actually what is the difference between correlation and cointegration?

Here's what I found after googling this up:

Correlation: measures how two series move in relation to each other: -1 indicating perfectly negative correlation, 0 indicating zero correlation and 1 indicating perfectly positive correlation.

Cointegration: asks whether some linear combination of the two series (e.g. their spread) is stationary, i.e. whether the gap between them keeps reverting to a constant mean instead of drifting apart. The blog actually says that this is better tested on prices instead of returns (huh funny?) but let's see what our results are.

We now pick two stocks. I originally chose MSFT and GOOGL but turns out the p-value increased tremendously. Just an example of why I need discipline to be good at this lol. Let’s take PYPL and GOOGL this time. I generate the time series after normalising the prices.

I then run an OLS regression on the price data for the two stocks. The slope coefficient (beta_1) is the hedge ratio.

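We then calculate the spread (not sure what this is actually? As far as I can tell it's one price minus the hedge ratio times the other, so it's the part of the first price that the second one doesn't explain).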
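A small sketch of the hedge ratio and spread on synthetic prices. np.polyfit is used here as a stand-in for the OLS regression (a degree-1 polyfit is the same least-squares fit); asset1, asset2 and the true slope of 2.0 are made up for illustration.

```python
import numpy as np

# Synthetic pair: asset1 is roughly 5 + 2 * asset2 plus noise.
rng = np.random.default_rng(2)
asset2 = 100 + rng.normal(size=500).cumsum()
asset1 = 5 + 2.0 * asset2 + rng.normal(scale=0.5, size=500)

# Least-squares fit asset1 = beta_0 + beta_1 * asset2.
# For deg=1, polyfit returns [slope, intercept].
beta_1, beta_0 = np.polyfit(asset2, asset1, deg=1)

# Spread: what is left of asset1 after removing beta_1 units of asset2.
spread = asset1 - beta_1 * asset2
print(round(beta_1, 3))
```

The hedge ratio beta_1 says how many units of asset2 offset one unit of asset1; if the pair is cointegrated, the spread should hover around a constant level.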

And then we use the Augmented Dickey-Fuller (ADF) test to check if the spread is stationary. My test statistic is lower than the critical value at 1%, so that means my series is stationary. yuss leggo.

We now generate a z-score (which is just normalising the series: subtract the mean, divide by the standard deviation), and then generate a trading signal from the z-score. This tells us how far the series is from its mean value, in standard deviations. If it is well above zero, the stock is overpriced (so you short sell) and if it is well below zero, the stock is underpriced (so you buy). I implemented this using np.select: if the z-score is greater than the upper limit, put -1 for short sell, and if the z-score is lower than the lower limit, put 1 for buy.

Then we take the first order difference to get the position in the stock. I also create a second signal which is just the opposite of the first signal (negative signal1).

So signals.positions1 indicates the positions in asset1, where if they are +1 we buy, if they are -1 we short.

Similarly, positions2 indicates positions in asset2
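The signal logic above can be sketched as follows. The column names follow the post (signal1, positions1, positions2); the function name zscore_signals, the thresholds of ±1 and the toy spread are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def zscore_signals(spread, upper=1.0, lower=-1.0):
    """Build z-score based signals and positions for a pair."""
    z = (spread - spread.mean()) / spread.std()
    signals = pd.DataFrame({"z": z})
    # -1 (short) above the upper limit, +1 (buy) below the lower limit, else 0.
    signals["signal1"] = np.select([z > upper, z < lower], [-1, 1], default=0)
    # The second asset takes the opposite side.
    signals["signal2"] = -signals["signal1"]
    # First-order difference marks where the signal changes (first row is NaN).
    signals["positions1"] = signals["signal1"].diff()
    signals["positions2"] = signals["signal2"].diff()
    return signals

spread = pd.Series([0.0, 0.0, 3.0, 0.0, -3.0, 0.0, 0.0])
signals = zscore_signals(spread)
print(signals)
```

Non-zero rows of positions1 and positions2 are the points that would get buy or short markers on the price chart.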

I just used a graph and created the markers on the price graph to illustrate where I would take what position. That looks gooood. I think that's all on the stat arb project for now. Let’s work on something new tomorrow!

-I

evanvanness · 5 years ago

Annotated edition, Week in Ethereum News, March 15 issue

The number of EthCC attendees testing positive has kept growing since I published the newsletter (for the record, most people I talk to now think the afterparty was the main spreading event), even while many can’t get tested. So no caffeine or beer for me just in case I’m affected (though I left the afterparty very early), and that lack of caffeine is pulling me down just a little. This might be a low-energy, “please clap” Jeb annotated issue.

Eth1

Overlay method for hex to binary tree conversion

A summary of the post-EthCC Stateless Eth meetings. Renewed focus on sync, particularly getNodeData

A writeup post-stateless ETH summit after ethCC as well as a summary. Quiet times usually follow productive meetings, hence only 2 bullet points this week.

Eth2

Latest Eth2 call. Notes from Ben and Mamy. Phase 1 prototyping coming soon

Latest phase0 spec v0.11, the target for stable multi-client testnet

Ben Edgington’s notes from networking call

Nimbus client update – interop this month, discussion around constraints of running eth2 client on mobile devices

Two phase2 ethresearch posts: Appraisal of Non-sequential Receipt Cross-shard Transactions and Atomic Cross Shard Function Calls using System Events, Live Parameter Checking, & Contract Locking

Vitalik’s Using polynomial commitments to replace state roots, though this is not likely to hit the current roadmap. More context from listening to Justin Drake and Vitalik Buterin on Zero Knowledge

So my current estimate (completely my own) is that we’re likely looking at late q2 for phase0 launch. But who knows, maybe getting locked down will provide a small speedup? <wry grin>

I continue to think that by far the most important thing after shipping phase 0 is turning off proof of work. Stop wasting electricity! Cut issuance!

Stuff for developers

Solidity v0.6.4

A storage layout for proxy contracts taking advantage of Solidity v0.6.4

EthGlobal’s survey of Eth developers

10x smaller Javascript signer/verifier

Interacting with Ethereum using a shell through Incubed ultra-light client

Groth16 bellman proof verifier

Templates with pre-filled contract ABIs, addresses and subgraphs for Aave, Compound, Sablier, Uniswap

Prysmatic’s service registry pattern in Go

Implementing Merkle Trees and Patricia Tries in Node.js

Pipeline onchain interpreted language vid

Austin Griffith vid on wallet module for eth.build

OpenZeppelin points out that a malicious deployer can backdoor your Gnosis Safe

SmartBugs: framework for executing Solidity automated analysis tools, with an academic paper comparing tool performance

I probably should’ve added that your Gnosis Safe is always safe if you used the official front end of the mobile app.

Crypto carnage, Maker liquidations

Thursday’s global selloff of risk assets led to the most negative price action day of crypto’s short history. The selloff inflated gas prices (~200 gwei) which caused trouble for Maker. The Maker oracles stopped working for an hour or two.

Maker liquidation auctions went off for nearly 0 DAI as bots bidding on those auctions got caught in high gas prices and ran out of DAI, leading several different bot maintainers to make ~8m USD in ETH by bidding just above zero in a few disparate time periods.

As a result, the Maker system surplus became a 5.7m Dai deficit (as of the time of publication). To improve incentives, Maker governance changed some parameters and to recoup the debt MKR will be auctioned onchain for lots of 50,000 Dai on the morning (UTC) of March 19th.

Community members have started a backstop to ensure the deficit is covered

Here is a writeup of the Maker liquidations with data and graphs

Just published: Maker governance proposal to change DSR to 0 and Stability Fee to 0.5%, GSM to 4 hours, and a decentralized circuitbreaker for auctions

An interesting thing I just learned is that Maker’s standard keeper apparently only works in Parity, not with Geth or Infura. So that’s another ramification of the Kovan/Rinkeby split, and getting Maker to use Kovan.

In the meantime, USDC has been added as a collateral. It’s rather strange but USDC perhaps makes sense as a way to mint DAI in times of stress and get closer to the peg. Seems like the Stability Fee should be set high here though, as you really only want people using it in times of needing Dai, eg in auctions. Right now it’s 20%, I’m not sure that’s as high as it should be.

This newsletter doesn’t often mention price and market-related matters. But it’s quite clear that crypto is not a safe haven in crisis. Could it be in the future? Perhaps, but all the hedge funds and institutional money simply exacerbate volatility. Where we’re at is that when people wanted to take risk off the table, they viewed crypto as a risk asset - and Bitcoin got hit the hardest because it had survived the best in crypto winter, despite there being no reason whatsoever for it to have done the best.

Ecosystem

Prysmatic’s Raul Jordan: Eth2 is happening, it is shipping, and we’re going to make it a reality no matter what

EthIndia’s online hackathon winners

DuneAnalytic’s stats for smart contract wallets

4GB DAG size and potential hashrate impact

So far, 9 attendees of EthCC have tested positive for COVID-19

A fun parlor game: what will be the next big ETH event? Devcon? Or something before, or something after? I think we’re going to see a lot more online hackathons - and probably more sponsorship dollars for them. Perhaps more sponsorship fiat for newsletter subscriptions too?

Raul’s post on eth2 was the most clicked of the week.

Enterprise

End to end transport layer security with Hyperledger Besu v1.4

DAML now available on Besu

Paul Brody talks Baseline Protocol on Into the Ether

How Citi and ConsenSys use Ethereum for commodities trade finance

Nice komgo writeup. Also interesting to see that the bet of Besu seems to be paying off with enterprise privatechain stuff like DAML even on Besu.

Governance, DAOs, and standards

Livepeer’s proposed governance roadmap

SingularDTV announces snglsDAO Foundation for their media protocol press release

Aragon removes AGP voting for ANT holders

What DAOs can learn from the Swedish Pirate Party

How to quickly create your own DAOstack DAO

FakerDAO – pool your MKR to sell votes to highest bidder

Governance as a whole has probably been one of Ethereum’s weak points. Not as bad as governance-by-Blockstream, but still not great. People don’t turn out to vote so direct voting doesn’t work (to wit, Aragon removing voting which was the only use for ANT) - and yet one of the solutions for people not voting actually penalizes people for voting, as I’ve found out in DxDAO. I’m hopeful for some of the solutions but to date long-term governance of everything is mostly an unsolved issue.

Application layer

Numerai’s ErasureBay live on mainnet. A marketplace for any kind of information, where the buyer can slash the seller if they don’t like the information

DeFiSaver’s 1click transaction CDP closing using flashloans

Gnosis’ Gibraltar-regulated Sight political markets are live

Update on Augur v2. tldr: it’s close

Balancer’s code is open source

bZx’s mea culpa post mortem of the attacks. They also paid 1inch the full bug bounty two weeks ago.

Bluestone fixed rate loans and deposits, live on Rinkeby testnet

Maker’s Dai Gaming Initiative

VirtuePoker’s final beta launches March 16th

HavenSocial, a web3 alternative to Facebook where you own your own data

Nice to see people are still trying to build social media alternatives. The idea of building a better Facebook is definitely an enthralling one - yet not one that Ethereum has even come close to delivering.

Same with games - we’ve been talking about tokens/NFTs on ETH being a big thing in games for a while. Nothing has quite hit it (let’s be honest, CryptoKitties was just a different flavor of ICO mania) but I think Skyweaver might.

My usual ex-post metric of seeing how much of this section is DeFi: 10 bullet points, depending on how you count you could say it’s 4 to ~8.

Tokens/Business/Regulation

David Hoffman: Ethereum as emergent structure

USDC: programmable dollars with business accounts and APIs

Uniswap volume is now tracked on Coinmarketcap

wBTC passes Lightning Network in value locked up

Matthew Green: US congressional bill EARN IT is a direct attack on e2e encryption

Mass panic like with Corona is always a perfect moment to add bills on as riders to must-pass bills, so look for anti-encryption hawks to try to do this in the name of “safety.” Maybe even to bailout bills.

Kinda interesting to see CMC finally add Uniswap volume. They’ve been quite slow to add dexes generally; it seems like Bitcoiners often have a hard time adjusting to decentralization when they’ve been used to all the centralized BTC tradeoffs.

And Circle is now all-in on USDC. From Santander prototype at Devcon2 to $600m now printed, and this doesn’t even count Tether belatedly realizing that BTC was a terrible choice to secure Tether.

General

Contribute computing cycles to fight COVID-19

Stay private in DeFi with email

Brave’s nightly release features random browser fingerprints per session

Load Value Injection attack on Intel SGX

Jacobians of hyperelliptic curves explainer from Alan Szepieniec

Ryan Sean Adams’ “how to” on using ProtonMail or equivalent is the 2nd most clicked, showing how he’s one of the most important people in Ethereum right now. He takes concepts and popularizes them.

The random browser fingerprints are huge, and a big step up in privacy.

Meanwhile if you have 2gb or 3gb GPUs, you can fold some proteins which may have an impact on COVID-19. I’m always skeptical, but it seems likely to be worth the cost. Especially if you’re like me and get super cheap electricity in Texas through GridPlus! Crypto is not cancelled in Texas.

gilbertineonfr2 · 8 years ago

Botconf 2017 Wrap-Up Day #2

I’m just back from the social event that was organized at the aquarium Mare Nostrum. A very nice place full of threats as you can see in the picture above. Here is my wrap-up for the second day.

The first batch of talks started with “KNIGHTCRAWLER, Discovering Watering-holes for Fun, Nothing” presented by Félix Aimé. This is Félix’s personal project that he started in 2016 to get his own threat intelligence platform. He started with some facts like the definition of a watering hole: it is the insertion of specific malicious scripts on a specific website to infect visitors. Usually, it is JavaScript plus an iframe that redirects to the malicious server, but it can also be a malvertising campaign (via banners). They are not easy to track because, on the malicious server, you can have protections like IP whitelists (in case of targeted attacks or to keep researchers away), browser fingerprinting, etc. Then he explained how he built his own platform and the techniques used to find suspicious activities: passive DNS, Common Crawl indexes, directory scraping, leaked DNS, … It is interesting to note that he uses YARA rules. In fact, he created his personal (legal) botnet. The architecture is based on a master server (the C&C) which talks to crawler servers. Currently, he’s monitoring 25K targets. This is an ongoing project and Félix will keep improving it. Note that it is not publicly available. He also gave some nice examples of findings like the keylogger on WordPress that we reported yesterday. He told me he detected it for the first time a few months ago! Very nice project!

The second talk was a complete review of the Wannacry attack that hit many organizations in May 2017: “The (makes me) Wannacry Investigation” presented by Alan Neville from Symantec. This is the last time that the SANS ISC InfoCON was raised to yellow! Everybody remembers this bad story. Alan reviewed some major virus infections of the last years like Blaster (2003) or Conficker (2008). These malware infected millions of computers but, in the case of Wannacry, “only” 300K hosts were infected. But the impact was much greater: factories, ATMs, billboards, health devices, etc. Then Alan reviewed some technical aspects of Wannacry and mentioned, of course, the famous kill-switch domain: iuqerfsodp9ifjaposdfjhgosurijfaewrwergwea[.]com. In fact, Symantec detected an early version of the ransomware a few months before (without the Eternal Blue exploit). They also observed some attacks in March/April 2017. But basic security rules could have reduced the impact of the ransomware: have a proper patching procedure as well as backup/restore procedures.

After the morning coffee refill, Maria Jose Erquiaga came on stage to present: “Malware Uncertainty Principle: an Alteration of Malware Behavior by Close Observation”. This talk presented a study of the influence of web TLS interception on malware analysis. Indeed, today, more and more malware communicates on top of HTTPS. What will happen if we play MitM with it to intercept communications with the C&C server? Maria explained the lab that was deployed with two scenarios: with and without an intercepting proxy.

Once the project was in place, they analyzed many samples and captured all the traffic. The result of this research is available online (link). What did they find? Sometimes, there is no communication at all with the C&C because the malware is using a custom protocol via TCP/443. This one is rejected by the proxy. Some samples tried to reconnect continuously or looked for another way to connect (ex: via different ports).

The next one was “Knock Knock… Who’s there? admin admin, Get In! An Overview of the CMS Brute-Forcing Malware Landscape” presented by Anna Shirokova from Cisco. This talk was presented at BruCON but, being part of the organization, I was not able to follow it. Fortunately, this time I could. I’m maintaining multiple WordPress sites and, I fully agree, brute-force attacks are constantly launched and pollute my logs. Anna started with a review of the brute-force attacks and the targets. Did you know that ~5% of the Internet websites are running WordPress? This is a de-facto target. There are two types of brute-force attacks: the vertical one (a list of passwords is tested against one target) and the horizontal one (one password is tested against a list of targets). Brute-force attacks are not new. Anna made a quick recap from 2009 until 2015 with nice names like FortDisco, Mayhem, CMS Catcher, Troldesh, etc. And it’s still increasing… Then Anna focused on Sathurbot, which is a modular botnet with different features (downloader, web crawler and brute-forcer). The crawler module uses search engines to find a list of sites to be targeted (ex: “bing.com/search?q=makers%20manage%20manual”). Then the brute-force attack starts against /wp-login.php. Nice research which revealed that the same technique is always used and that many WordPress instances are still using weak passwords! Note that it is difficult to measure the success rate of those brute-force attacks.
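To make the vertical/horizontal distinction concrete, here is a small log-analysis sketch from the defender's side. The function name classify_attempts and the (target, password) tuple format are hypothetical; real login logs would need parsing first.

```python
from collections import defaultdict

def classify_attempts(attempts):
    """Classify a batch of failed logins given as (target, password) tuples."""
    passwords_per_target = defaultdict(set)
    targets_per_password = defaultdict(set)
    for target, password in attempts:
        passwords_per_target[target].add(password)
        targets_per_password[password].add(target)

    n_targets = len(passwords_per_target)
    n_passwords = len(targets_per_password)
    if n_targets == 1 and n_passwords > 1:
        return "vertical"    # many passwords against one target
    if n_passwords == 1 and n_targets > 1:
        return "horizontal"  # one password against many targets
    return "mixed"

vertical = [("blog.example", p) for p in ("admin", "123456", "password")]
horizontal = [(t, "admin") for t in ("a.example", "b.example", "c.example")]
print(classify_attempts(vertical), classify_attempts(horizontal))
```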

Then Mayank Dhiman & Will Glazier presented “Automation Attacks at Scale or Understanding ‘Credential Exploitation’”. There are many tools to steal credentials on the Internet and others to re-use them to perform malicious activities (account takeover, fake account creation, shopping bots, API abuse, etc). Many toolkits were briefly reviewed: SentryMBA, Fraudfox, AntiDetect but also more classic tools like Hydra, curl, wget, Selenium, PhantomJS. The black market is full of services that offer configuration files for popular websites. According to the research, 10% of the Alexa top websites have a config file available on the black market (which describes how to abuse them, the API, etc). Top targets are gaming websites, entertainment and e-commerce. No surprise here. To abuse them, you need: a config file, stolen credentials, some IP addresses (for rotation) and some computing power. Credentials are quite easy to find: pastebin.com is your best friend. Note that attackers need good IP addresses; the best sources are cloud services, compromised IoT devices or proxy farms. They gave a case study about a large US retailer that was targeted by 40K IP addresses from 61 countries. But how to protect organizations against this kind of attack?

Analyze HTTP(S) requests and headers to fingerprint attack tools

Use machine learning to detect forged browser behaviour

Use threat intelligence

Data analytics (look for patterns)
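As an illustration of the first point, a very naive header fingerprint could look like the sketch below. The fingerprint function and the pattern table are hypothetical; they only cover the default User-Agent strings of a few tools named in the talk, which any serious attacker will override, so treat this as a toy and not a detection rule.

```python
import re

# Default User-Agent markers of a few tools (illustrative, not exhaustive).
TOOL_PATTERNS = {
    "curl": re.compile(r"^curl/"),
    "wget": re.compile(r"^Wget/"),
    "PhantomJS": re.compile(r"PhantomJS/"),
}

def fingerprint(headers):
    """Return a tool name, "suspicious", or None for a dict of request headers."""
    ua = headers.get("User-Agent", "")
    for tool, pattern in TOOL_PATTERNS.items():
        if pattern.search(ua):
            return tool
    # Browsers normally send Accept-Language; its absence is a weak hint only.
    if "Accept-Language" not in headers:
        return "suspicious"
    return None

print(fingerprint({"User-Agent": "curl/7.68.0"}))
print(fingerprint({"User-Agent": "Mozilla/5.0", "Accept-Language": "en-US"}))
```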

The next one was “The Good, the Bad, the Ugly: Handling the Lazarus Incident in Poland” presented by Maciej Kotowicz. Maciej looked back at a big targeted attack that occurred in Poland. This talk was flagged as TLP:AMBER. Sorry, no coverage. If you are interested, here is a link for more info about Lazarus.

After the (delicious) lunch, Daniel Plohmann presented his project: “Malpedia: A Collaborative Effort to Inventorize the Malware Landscape”. Malpedia can be summarized in a few words: a free, independent resource of labeled, unpacked samples. The idea of Malpedia came two years ago during Botconf. The goal is to offer a high-quality repository of malware samples (Daniel insisted on the fact that quality is better than quantity), properly analyzed and tagged. Current solutions (botnets.fr, theZoo, VirusBay.io) still have trouble identifying samples properly. In Daniel’s project, samples are classified by families. What is a malware family? According to Daniel, it’s all samples that belong to the same project seen from a developer’s point of view. After explaining the collection process, he gave some interesting stats based on his current collection (as of today, 2491 samples from 669 families). Nice project, and access is available upon request (if you met Daniel IRL) or through vouching by other people. Malpedia is available here.

The next talk was… hard! When the speaker warns you that some slides will contain a lot of assembly code, you know what to expect! “YANT – Yet Another Nymaim Talk” was presented by Sebastian Eschweiler. What I was able to follow: Nymaim is a malware that uses very complex anti-analysis techniques to defeat researchers and analysts. The main technique used is called “Heaven’s Gate”. It is a mechanism to call 64-bit code directly from a 32-bit process. It is very useful to encrypt code, hide from static analysis tools and a nice way to evade sandbox hooks.

After the afternoon coffee break, Amir Asiaee presented “Augmented Intelligence to Scale Humans Fighting Botnets”. It started with a fact: today, there is too much malware and there are too few researchers. So we need to automate as much as possible. Amir is working for a company that gets feeds of DNS requests from multiple ISPs. They get 100B DNS queries per day! Malware is moving faster than it used to: it uses complex DGAs and the lifetime of a C&C is shorter, so there is a clear need for quick analysis of all this data. Amir explained how they process this huge amount of data using NLP (“Natural Language Processing”).

The engineering challenge is to process all this data and to spot new core domains… when real time is key! Here is a cool video about the data processing. Then Amir explained some use cases. Two interesting examples: Bedep uses exchange rates as a DGA seed… Some others produce too many collisions (ex: [a-z]{6}.com), which could lead to many false positives: what about akamai.com?
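The false-positive problem with a pattern like [a-z]{6}.com is easy to reproduce. A tiny sketch (the domain list is made up, apart from akamai.com which is the example from the talk):

```python
import re

# Naive DGA pattern: exactly six lowercase letters followed by ".com".
DGA_PATTERN = re.compile(r"[a-z]{6}\.com")

domains = ["akamai.com", "google.com", "qzxvkj.com", "example.org"]
flagged = [d for d in domains if DGA_PATTERN.fullmatch(d)]
print(flagged)
```

Both akamai.com and google.com are flagged alongside the random-looking qzxvkj.com, which is exactly why the pattern alone is not enough.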

The last talk covered the Stantinko botnet: “Stantinko: a Massive Adware Campaign Operating Covertly since 2012” by Matthieu FAOU & Frédéric Vachon from Eset. It was a very nice review of the botnet. It started with some samples they received from a customer. They started the reverse engineering and, when you discover that a DLL belonging to an MP3 encoder application decrypts and loads another one in memory, you are facing something very suspicious! They were able to sinkhole the C&C server and started further analysis. What about the persistence? The malware creates two Windows services: PDS (Plugin Downloader Service) and BEDS (Browser Extension Downloader Service).

The purpose of the PDS is to compromise CMS (WordPress and Joomla) and to install a RAT and a Facebook bot. The BEDS is a flexible plugin system to install malicious extensions in the browser. Stantinko has many interesting anti-analysis features: the code is encrypted with a unique key per infection. The analysis requires finding the dropper and getting a sample plus its related context. There is a fileless plugin system. To get payloads, they had to code a bot mimicking an infected machine. What about the browser extension? The Ad-Fraud extension injects ads on targeted websites or redirects the user to an ad website before showing the right one. It also replaces ads with its own. Note that URLs are hashed in the config files! Another module is the search parser, which searches on Google or Yandex for potential victims to perform brute-force attacks. Finally, a RAT module is also available. This botnet has an estimated size of 500K hosts. More details about Stantinko are available here.

The day ended with a good lightning talks session: 14 presentations in 1h! Some of them were really interesting, others very funny. In bulk mode, what was presented:

The Onyphe project

IoT Malware classification

Dropper analysis (https://malware.sekoia.fr)

Deft Linux (Free DFIR Linux distribution) DART deftlinux.net

Sysmon FTW

PyOnyphe: Onyphe Python library to use the API

Autopwn

Just a normal phishing

Context enrichment for IR

Yet another sandbox evasion “you_got_damn_right” HTTP header gist.github.com/bcse/1834878

Sysmon sigs for Linux honeypots

Malware config dynamic extraction (Gootkit)

IDA Appcall

A Knightcrawler demo (see above)

See you tomorrow for the last day!

[The post Botconf 2017 Wrap-Up Day #2 has been first published on /dev/random]

from Xavier
